Smart Paper Notes

Edge-Enhanced Dual Discriminator Generative Adversarial Network for Fast MRI with Parallel Imaging Using Multi-view Information

Jiahao Huang , Weiping Ding , Jun Lv , Jingwen Yang , Hao Dong , Javier Del Ser , Jun Xia , Tiaojuan Ren , Stephen Wong , Guang Yang

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2021-12-10

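In clinical medicine, magnetic resonance imaging (MRI) is one of the most important tools for diagnosis, triage, prognosis, and treatment planning. However, MRI suffers from an inherently slow data acquisition process because data are collected sequentially in k-space. In recent years, most MRI reconstruction methods in the literature have focused on holistic image reconstruction rather than on enhancing edge information. This work departs from that trend by concentrating on the enhancement of edge information. Specifically, we introduce a novel parallel imaging coupled dual discriminator generative adversarial network (PIDD-GAN) for fast multi-channel MRI reconstruction that incorporates multi-view information. The dual discriminator design aims to improve the edge information in MRI reconstruction: one discriminator is used for holistic image reconstruction, whereas the other is responsible for enhancing edge information. An improved U-Net with local and global residual learning is proposed for the generator. Frequency channel attention blocks (FCA blocks) are embedded in the generator to incorporate attention mechanisms. A content loss is introduced to train the generator for better reconstruction quality. We performed comprehensive experiments on the Calgary-Campinas public brain MR dataset and compared our method with state-of-the-art MRI reconstruction methods. Ablation studies of residual learning were conducted on the MICCAI13 dataset to validate the proposed modules. Results show that our PIDD-GAN provides high-quality reconstructed MR images with well-preserved edge information. Single-image reconstruction takes less than 5 ms, which meets the demand for faster processing.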
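As a rough illustration of what "edge information" can mean for an edge-focused discriminator, the sketch below computes a gradient-magnitude edge map from a 2D image. The abstract does not say which edge operator PIDD-GAN uses, so the Sobel filter, the function name `sobel_edge_map`, and the toy image are all assumptions made for illustration only.

```python
def sobel_edge_map(image):
    """Return the Sobel gradient magnitude of a 2D image given as a list of rows.

    Border pixels are left at 0.0 for simplicity (an assumption of this sketch).
    """
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    h, w = len(image), len(image[0])
    edges = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += kx[i][j] * pixel
                    gy += ky[i][j] * pixel
            edges[r][c] = (gx * gx + gy * gy) ** 0.5
    return edges


# Toy 5x5 "image": dark left columns, bright right columns (one vertical edge).
toy = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_map = sobel_edge_map(toy)
```

In a dual-discriminator setup of the kind described, one discriminator would see the full image while the other would see an edge map such as `edge_map`; how the two adversarial losses are weighted is not stated in the abstract.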

Translated by Google Translate

Robust Weakly Supervised Learning for COVID-19 Recognition Using Multi-Center CT Images

Qinghao Ye , Yuan Gao , Weiping Ding , Zhangming Niu , Chengjia Wang , Yinghui Jiang , Minhao Wang , Evandro Fei Fang , Wade Menpes-Smith , Jun Xia

Categories: Computer Vision | Machine Learning

2021-12-09

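The world is currently experiencing an ongoing pandemic of an infectious disease, coronavirus disease 2019 (COVID-19), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Computed tomography (CT) plays an important role in assessing the severity of the infection and can also be used to identify symptomatic and asymptomatic COVID-19 carriers. With the surge in the cumulative number of COVID-19 patients, radiologists are under increasing pressure to examine CT scans manually. An automated 3D CT scan recognition tool is therefore in high demand, because manual analysis is time-consuming for radiologists and their fatigue can lead to misjudgment. However, because of the differing technical specifications of CT scanners located in different hospitals, the appearance of CT images can vary significantly, which causes many automated image recognition approaches to fail. The multi-domain shift problem in multi-center and multi-scanner studies is therefore nontrivial, and solving it is crucial for dependable recognition and for reproducible and objective diagnosis and prognosis. In this paper, we propose a COVID-19 CT scan recognition model, the coronavirus information fusion and diagnosis network (CIFD-Net), which can efficiently handle the multi-domain shift problem via a new robust weakly supervised learning paradigm. Compared with other state-of-the-art methods, our model can resolve the problem of differing appearance in CT scan images reliably and efficiently.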


StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Yifeng Ma , Suzhen Wang , Zhipeng Hu , Changjie Fan , Tangjie Lv , Yu Ding , Zhidong Deng , Xin Yu

Categories: Computer Vision

2023-01-03

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.


Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics

Yahao Ding , Zhaohui Yang , Quoc-Viet Pham , Zhaoyang Zhang , Mohammad Shikh-Bahaei

Categories: Machine Learning | Artificial Intelligence

2023-01-03

Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms, such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this article provides a comprehensive survey of DL applications for UAV swarms across a wide range of scenarios.


Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

Jianzong Wu , Xiangtai Li , Henghui Ding , Xia Li , Guangliang Cheng , Yunhai Tong , Chen Change Loy

Categories: Computer Vision

2023-01-02

In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.


Curvature regularization for Non-line-of-sight Imaging from Under-sampled Data

Rui Ding , Juntian Ye , Qifeng Gao , Feihu Xu , Yuping Duan

Categories: Computer Vision

2023-01-01

Non-line-of-sight (NLOS) imaging aims to reconstruct three-dimensional hidden scenes from data measured in the line of sight, using photon time-of-flight information encoded in light after multiple diffuse reflections. Under-sampled scanning data can facilitate fast imaging. However, the resulting reconstruction problem becomes a severely ill-posed inverse problem, whose solution is likely to be degraded by noise and distortions. In this paper, we propose two novel NLOS reconstruction models based on curvature regularization: the object-domain curvature regularization model and the dual-domain (signal and object) curvature regularization model. Fast numerical optimization algorithms are developed based on the alternating direction method of multipliers (ADMM) with a backtracking stepsize rule, and are further accelerated by a GPU implementation. We evaluate the proposed algorithms on both synthetic and real datasets, where they achieve state-of-the-art performance, especially in the compressed sensing setting. All our code and data are available at https://github.com/Duanlab123/CurvNLOS.


How would Stance Detection Techniques Evolve after the Launch of ChatGPT?

Bowen Zhang , Daijun Ding , Liwen Jing

Categories: Natural Language Processing

2022-12-30

Stance detection refers to the task of extracting the standpoint (Favor, Against, or Neither) towards a target in given texts. Such research has gained increasing attention with the proliferation of social media content. The conventional framework for handling stance detection converts it into a text classification task. Deep learning models have already replaced rule-based models and traditional machine learning models in solving such problems. Current deep neural networks face two main challenges: insufficient labeled data and information in social media posts, and the unexplainable nature of deep learning models. A new pre-trained language model, ChatGPT, was launched on Nov 30, 2022. For stance detection tasks, our experiments show that ChatGPT can achieve SOTA or similar performance on commonly used datasets, including SemEval-2016 and P-Stance. At the same time, ChatGPT can provide explanations for its own predictions, which is beyond the capability of any existing model. The explanations for the cases in which it cannot provide classification results are especially useful. ChatGPT has the potential to be the best AI model for stance detection tasks in NLP, or at least to change the research paradigm of this field. ChatGPT also opens up the possibility of building explanatory AI for stance detection.


A polynomial time iterative algorithm for matching Gaussian matrices with non-vanishing correlation

Jian Ding , Zhangsong Li

Categories: (Statistical) Machine Learning

2022-12-28

Motivated by the problem of matching vertices in two correlated Erdős-Rényi graphs, we study the problem of matching two correlated Gaussian Wigner matrices. We propose an iterative matching algorithm, which succeeds in polynomial time as long as the correlation between the two Gaussian matrices does not vanish. Our result is the first polynomial time algorithm that solves a graph matching type of problem when the correlation is an arbitrarily small constant.
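To make the problem setup concrete, the sketch below generates one instance of the matching task: a symmetric Gaussian matrix, a correlated copy, and a hidden relabeling of the copy's rows and columns. It does not implement the paper's iterative algorithm. The construction `y = rho * x + sqrt(1 - rho^2) * z` is a common way to obtain entrywise correlation `rho` and is assumed here, as are the function names and the zero diagonal.

```python
import math
import random


def correlated_wigner_pair(n, rho, seed=0):
    """Return (a, b, perm): symmetric Gaussian matrices whose off-diagonal
    entries have correlation rho once b is aligned by the hidden permutation."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)                       # hidden vertex relabeling
    a = [[0.0] * n for _ in range(n)]
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            x = rng.gauss(0.0, 1.0)
            z = rng.gauss(0.0, 1.0)         # independent noise
            y = rho * x + math.sqrt(1.0 - rho * rho) * z
            a[i][j] = a[j][i] = x
            b[perm[i]][perm[j]] = b[perm[j]][perm[i]] = y
    return a, b, perm


def aligned_correlation(a, b, perm):
    """Empirical correlation of upper-triangular entries under a candidate matching."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [b[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


a, b, hidden = correlated_wigner_pair(80, 0.5, seed=1)
score_true = aligned_correlation(a, b, hidden)
```

The matching problem is to recover `hidden` from `a` and `b` alone; under the true permutation the empirical correlation should sit near `rho`, while a wrong permutation typically gives a value near zero.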


Online Learning for Adaptive Probing and Scheduling in Dense WLANs

Tianyi Xu , Ding Zhang , Zizhan Zheng

Categories: Machine Learning | Artificial Intelligence

2022-12-27

Existing solutions to network scheduling typically either assume that the instantaneous link rates are completely known before a scheduling decision is made, or consider a bandit setting where the accurate link quality is discovered only after the link has been used for data transmission. In practice, the decision maker can obtain (relatively accurate) channel information, e.g., through beamforming in mmWave networks, right before data transmission. However, frequent beamforming incurs a formidable overhead in densely deployed mmWave WLANs. In this paper, we consider the important problem of throughput optimization with joint link probing and scheduling. The problem is challenging even when the link rate distributions are known in advance (the offline setting), because the information gained from probing must be balanced against the cost of reduced data transmission opportunities. We develop an approximation algorithm with guaranteed performance when the probing decision is non-adaptive, and a dynamic-programming-based solution for the more challenging adaptive setting. We further extend our solutions to the online setting with unknown link rate distributions, developing a contextual-bandit-based algorithm and deriving its regret bound. Numerical results using data traces collected from real-world mmWave deployments demonstrate the efficiency of our solutions.


Semi-Supervised Semantic Segmentation Methods for UW-OCTA Diabetic Retinopathy Grade Assessment

Zhuoyi Tan , Hizmawati Madzin , Zeyu Ding

Categories: Computer Vision

2022-12-27

People with diabetes are more likely to develop diabetic retinopathy (DR) than healthy people, and DR is the leading cause of blindness. At present, the diagnosis of diabetic retinopathy relies mainly on experienced clinicians recognizing fine features in color fundus images, which is a time-consuming task. Therefore, to promote the development of automatic DR detection on UW-OCTA images, we propose a novel semi-supervised semantic segmentation method for UW-OCTA DR image grade assessment. First, the method uses the MAE algorithm to perform semi-supervised pre-training on the UW-OCTA DR grade assessment dataset to mine the supervisory information in the UW-OCTA images, thereby alleviating the need for labeled data. Second, to mine the lesion features of each region in the UW-OCTA image more fully, we construct a cross-algorithm ensemble DR tissue segmentation algorithm by deploying three algorithms with different visual feature processing strategies. The three sub-algorithms are a pre-trained MAE, ConvNeXt, and SegFormer; based on their initials, the ensemble is named MCS-DRNet. Finally, we use MCS-DRNet as an inspector to check and revise the preliminary results of the DR grade evaluation algorithm. The experimental results show that the mean Dice similarity coefficients of MCS-DRNet v1 and v2 are 0.5161 and 0.5544, respectively. The quadratic weighted kappa of the DR grading evaluation is 0.7559. Our code will be released soon.
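The two metrics reported above have standard definitions, sketched below in plain Python for readers who want to see how such numbers are computed. This is a minimal illustration, not the authors' evaluation code; the function names, the flat-list mask representation, and the convention for empty masks are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient for two flat binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Quadratic weighted kappa for integer grades in [0, num_classes), num_classes >= 2."""
    n = len(rater_a)
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes))
              for j in range(num_classes)]
    numerator = denominator = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2  # quadratic penalty
            expected = hist_a[i] * hist_b[j] / n            # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected
    # A zero denominator only occurs when both raters gave one identical grade.
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
kappa = quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], num_classes=3)
```

Dice rewards overlap between predicted and reference lesion masks, while the quadratic weighting in kappa penalizes a grade that is off by two levels four times as heavily as one that is off by a single level.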
